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We describe and test an implicit solvent all- atom potential for simulations 
of protein folding and aggregation. The potential is developed through stud- 
ies of structural and thermodynamic properties of 17 peptides with diverse 
secondary structure. Results obtained using the final form of the poten- 
tial are presented for all these peptides. The same model, with unchanged 
parameters, is furthermore applied to a heterodimeric coiled-coil system, 
a mixed a//3 protein and a three-helix-bundle protein, with very good re- 
sults. The computational efficiency of the potential makes it possible to 
investigate the free-energy landscape of these 49-67-residue systems with 
high statistical accuracy, using only modest computational resources by to- 
day's standards. 



INTRODUCTION 

A molecular understanding of living systems requires modeling of the dynamics 
and interactions of proteins. The relevant dynamics of a protein may amount 
to small fluctuations about its native structure, or reorientations of its ordered 
parts relative to each other. In either case, a tiny fraction of the conformational 
space is explored. For flexible proteins, perhaps with large intrinsically disor- 
dered parts (1; 2), the situation is different. When studying such proteins or 
conformational conversion processes like folding or amyloid aggregation, the 
competition between different minima on the free-energy landscape inevitably 
comes into focus. Studying these systems by computer simulation is a challenge, 
because proper sampling of all relevant free-energy minima must be ensured. 
This goal is very hard to achieve if explicit solvent molecules are included in the 
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Figure 1: Schematic illustration of native geometries studied, (a) the Trp-cage, (b) an 
a-helix, (c) a ^-hairpin, (d) a three-stranded ^ -sheet, (e) an a-helix dimer 
(1U2U), (f) a three-helix bundle (1LQ7), and (g) amixed a/p protein (2GJH). 

simulations. The use of coarse-grained models can alleviate this problem, but 
makes important geometric properties like secondary structure formation more 
difficult to describe. 

Here we present an implicit solvent all-atom protein model especially aimed 
at problems requiring exploration of the global free-energy landscape. It is based 
on a computationally convenient effective potential, with parameters determined 
through full-scale thermodynamic simulations of a set of experimentally well 
characterized peptides. Central to the approach is the use of a single set of model 
parameters, independent of the protein studied. This constraint is a simple but 
efficient way to avoid unphysical biases, for example, toward either a-helical or 
P -sheet structure (3; 4) . Imposing this constraint is also a way to enable system- 
atic refinement of the potential. 

An earlier version (5; 6) of this potential has proven useful, for example, for 
studies of aggregation (7; 8; 9) and mechanical unfolding (10; 11). Also, using 
a slightly modified form of the potential (12), the folding mechanisms of a 49- 
residue protein, Top7-CFr, were investigated (13; 14). Here we revise this poten- 
tial, through studies of an enlarged set of 17 peptides (see Table 1 and Figure 0). 
We show that the model, in its final form, folds these different sequences to struc- 
tures similar to their experimental structures, using a single set of potential pa- 
rameters. The description of each peptide is kept brief, to be able to discuss all 
systems and thereby address the issue of transferability in a direct manner. The 
main purpose of this study is model development rather than detailed character- 
ization of individual systems. 

Whether or not this potential, calibrated using data on peptides with typically 
~ 20 residues, will be useful for larger systems is not obvious. Therefore, we also 
apply our potential, with unchanged parameters, to three larger systems with dif- 
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Table 1: Amino acid sequences. Sue stands for succinylic acid. 


System 


PDB code 


Sequence 


Trp-cage 


1L2Y 


NLYIQ WLKDG GPSSG RPPPS 


E6apnl 


IRIJ 


Ac-ALQEL LGQWL KDGGP SSGRP PPS-NH2 


C 




Ac-KETAA AKFER AHA-NH2 


EK 




Ac-YAEAA KAAEA AKAF-NH2 


Fs 




Suc-AAAAA AAARA AAARA AAARA A-NH2 


GCN4tp 


20VN 


NYHLE NEVAR LKKLV GE 


HPLC-6 


IWFA 


DTASD AAAAA ALTAA NAKAA AELTA ANAAA AAAAT AR-NH2 


Chignolin 


lUAO 


GYDPE TGTWG 


MBH12 


1J4M 


RGKWTYNGITYEGR 


GBlp 




GEWTY DDATK TFTVT E 


GBlm2 




GEWTY NPATG KFTVT E 


GBlmS 




KKWTY NPATG KFTVQ E 


trpzipl 


ILEO 


SWTWE GNKWT WK-NH2 


trpzip2 


ILEl 


SWTWE NGKWT WK-NH2 


betanova 




RGWSV QNGKY TNNGK TTEGR 


LLM 




RGWSL QNGKY TLNGK TMEGR 


betaSs 




TWIQN GSTKW YQNGS TKIYT 


AB zipper 


1U2U 


Ac-EVAQL EKEVA QLEAE NYQLE QEVAQ LEHEG-NH2 

Ac-EVQAL KKRVQ ALKAR NYALK QKVQA LRHKG-NH2 


Top7-CFR 


2GJH 


ERVRI SITAR TKKEA EKFAA ILIKV FAELG YNDIN 
VTWDG DTVTV EGQL 


GS-asW 


1LQ7 


GSRVK ALEEK VKALE EKVKA LGGGG RIEEL KKKWE 
ELKKK lEELG GGGEV KKVEE EVKKL EEEIK KL 
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ferent geometries. These systems are the mixed a/yS protein Top7-CFr, a three- 
helix-bundle protein with 67 residues, and a heterodimeric leucine zipper com- 
posed of two 30-residue chains. 

Protein folding simulations are by necessity based on potentials whose terms 
are interdependent and dependent on the choice of geometric representation. 
Therefore, we choose to calibrate our potential directly against folding proper- 
ties of whole chains. To make this feasible, we deliberately omit many details 
included in force fields like Amber, CHARMM and OPLS (for a review, see (15)). 
With this approach, we might lose details of a given free -energy minimum, but, 
by construction, we optimize the balance between competing minima. 

Two potentials somewhat similar in form to ours are the //-potential of the 
Shakhnovich group (16) and the PFF potential of the Wenzel group (17). These 
groups also consider properties of entire chains for calibration, but use folded 
PDB structures or sets of decoys rather than full-scale thermodynamic simu- 
lations. Our admittedly time-consuming procedure implies that our model is 
trained on completely general structures, which might be an advantage when 
studying the dynamics of folding. Another potential with similarities to ours is 
that developed by the Dokholyan group for discrete molecular dynamics simula- 
tions (18). 

METHODS 

Our model belongs to the class of implicit solvent all-atom models with torsional 
degrees of freedom. All geometrical parameters, like bond lengths and bond an- 
gles, are as described earlier (5). 

The interaction potential is composed of four major terms: 

^ = ^loc + ^ev + ^hb + ^sc • (1) 

The first term, ^loc, contains local interactions between atoms separated by only 
a few covalent bonds. The other three terms are non-local in character: Eev 
represents excluded-volume effects, E^h is a hydrogen-bond potential, and Esc 
contains residue-specific interactions between pairs of side-chains. Next we de- 
scribe the precise form of these four terms. Energy parameters are given in a 
unit called eu. The factor for conversion from eu to kcal/mol will be determined 
in the next section, by calibration against the experimental melting temperature 
for one of the peptides studied, the Trp-cage. 
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Local potential. The local potential iiioc = E^J^ + E^J^ + E^J^ can be divided into 
two backbone terms, and E^^^., and one side-chain term, E^^l. In describing 
the potential, the concept of a peptide unit is useful. A peptide unit consists of 
the backbone CO group of one residue and the backbone NH group of the next 
residue. 

• The potential E^^l represents interactions between partial charges of neigh- 
boring peptide units along the chain. It is given by 



where the outer sum runs over all pairs of nearest-neighbor peptide units 
and each of the two inner sums runs over atoms in one peptide unit (if the 
N side of the peptide unit is proline the sum runs over only C and O). The 
partial charge qi is taken as ±0.42 for C and O atoms and ±0.20 for H and N 
atoms. The parameter k\^^ is set to 6 eu, corresponding to a dielectric con- 
stant of er f«41. Two peptide units that are not nearest neighbors along the 
chain interact through hydrogen bonding (see below) rather than through 
the potential E^^l. 

• The term eI^I provides an additional 00 and HH repulsion for neighbor- 
ing peptide units, unless the residue flanked by the two peptide units is a 
glycine. This repulsion is added to make doubling of hydrogen bonds less 
likely. Glycine has markedly different backbone energetics compared to 
other residues. The lack of Cp atom makes glycine more flexible. However, 
the observed distribution of Ramachandran ^, tp angles for glycine in PDB 
structures (19) is not as broad as simple steric considerations would sug- 
gest. eI^I provides an energy penalty for glycine xp values around ±120°, 
which are sterically allowed but relatively rare in PDB structures. 

The full expression for E^^l is 




(2) 




X [/(i^/) + /(i^/)]+^c,GS^cosV^/ + 2cos2V^,), (3) 



non-Gly Gly 



where k[J^ =1.2 eu, k'iocg = —0.15 eu, / is a residue index, and 

ui = min[d(Hj, N7+1), d(N/, Hj+O] - rf(H/, H7+1) 
vi = mm[d(0,, C'l+i), d(C'i, O/+1)] - d(0/, Oj+i) 



(4) 



(5) 



/(x) = max(0, tanhSx) 



(6) 
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Table 2: Classification of side-chain angles, Xi- The parameters of the torsion angle po- 




f (3) 


rii) = (0.6 eu, 3) for class I, (7f|^^ ., nt] = (0.3 eu, 3) for class II, 


r (3) ^ 


= (0.4 eu, 2) for class III, and [k^^^^^, m) = (-0.4 eu, 2) for class IV. 


Residue 




X3 Xi 


Ser, Cys, Thr, Val 






He, Leu 


I I 




Asp, Asn 


I IV 




His, Pile, Tyr, Trp 


I III 




Met 


I I 


II 


Glu, Gin 


I I 


IV 


Lys 


I I 


I I 


Arg 


I I 


I III 



The function f{ui) is positive if the H/H/+i distance, d(H/, H/+i), is smaller 
than both of the H/N/+i and N/H/+i distances, and zero otherwise. This 
term thus provides an energy penalty when H/ and H/+i are exposed to 
each other (it is omitted if residue 7 or /+ 1 is a proline). Similarly, f[vi] is 
positive when 0/ and 0/+i are exposed to each other. 

• E^^l is an explicit torsion angle potential for side-chain angles, j/. Many 
side-chain angles display distributions resembling what one would expect 
based on simple steric considerations. The use of the torsion potential is 
particularly relevant for ^2 in asparagine and aspartic acid and in glu- 
tamine and glutamic acid. The torsion potential is defined as 

i 

where k:[^|. . and are constants. Each side-chain angle Xi belongs to one 
of four classes associated with different values of k^^I j and (see Table 2). 



Excluded volume. Excluded- volume effects are modeled using the potential 



(8) 



where the summation is over all pairs of atoms with a non-constant separation, 
KTev^O.lOeu, and(7, =1.77, 1.75, 1.53, 1.42 and 1.00 A for S, C, N, O and H atoms. 
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respectively. The parameter A,-j is unity for pairs connected by three covalent 
bonds and A,j =0.75 for all other pairs. To speed up the calculations, Eev is eval- 
uated using a cutoff of 4.3A,y A. 

Hydrogen bonding. Our potential contains an explicit hydrogen-bond term, 
Ehb- All hydrogen bonds in the model are between NH and CO groups. They 
connect either two backbone groups or a charged side-chain (aspartic acid, glu- 
tamic acid, lysine, arginine) with a backbone group. Two neighboring peptide 
units, which interact through the local potential (see above), are not allowed to 
hydrogen bond with each other. 

The form of the hydrogen-bond potential is 

bb-bb sc-bb 

where ej,'^ =3.0 eu and e|f^ =2.3 eu set the strengths of backbone-backbone and 
sidechain-backbone bonds, respectively, r,; is the HO distance, atj is the NHO 
angle, and Pij is the HOC angle. The functions u{r) and v{a, P) are given by 

/- ^ N 12 /- ^ N 10 

„{r) = 5[^] -6^^] (10) 

.(a./3)=l (11) 
otherwise 



where (Thb = 2.0 A. A 4.5 A cutoff is used for u{r). 



Side-chain potential. Our side-chain potential is composed of two terms. Esc = 
Ehp + Ech- The Ech term represents interactions among side-chain charges. The 
first and more important term, ijhp, is meant to capture the effects of all other rel- 
evant interactions, especially effective hydrophobic attraction. For convenience, 
Ehp and ^ch have a similar form, 

Eb, = -^Mf/'cf/' £eh = . (12) 

KJ KJ 

Here the sums run over residue pairs / /, cff^ and Cfj^^ are contact measures that 
take values between and 1, and M^f^ and Mf^^ are energy parameters. 

It is assumed that ten of the twenty natural amino acids contribute to E^p, see 
Table 3. Included among these ten are lysine and arginine, which are charged 
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Table 3: The parameter mj of the hydrophobicity potential ^hp- 



Residue mi (eu) 



Arg 


0.3 


Met, Lys 


0.4 


Val 


0.6 


He, Leu, Pro 


0.8 


Tyr 


1.1 


Phe, Trp 


1.6 



but have large hydrophobic parts. To reduce the number of parameters, the hy- 
drophobic contact energies are taken to be additive, M*^' = mi + mj.lt is known 
that the statistically derived Miyazawa-Jernigan contact matrix (20) can be ap- 
proximately decomposed this way (21). The rui parameters can be found in Ta- 
ble 3. M*^' is set to if residues / and / are nearest neighbors along the chain, 
and is reduced by a factor 2 for next-nearest neighbors. 

The residues taken as charged are aspartic acid, glutamic acid, lysine and argi- 
nine. The charge-charge contact energy is —Mff^ = 1.55/5/ eu, where 5/ and Sj 
are the signs of the charges (±1). 

The contact measure cff ^ is calculated using a predetermined set of atoms for 
each amino acid, denoted by A*^' (see Table 4). Let rii be the number of atoms 
in Af" and let 

ry=Z^(mmrJ). (13) 

where g(x) is unity for x < (3.7A)2, vanishes for x > [A.SAf, and varies linearly for 
intermediate x. The contact measure can then be written as 

,,hp, _ min(n;(n, + nAry+ry) 
TiAni + nj] 

where Yu is either 1 or 0.75. For yij = 1, cff^ is, roughly speaking, the fraction 
of atoms in A*^' and A*''' that are in contact with some atom from the other of 
the two sets. A reduction to yu = 0.75 makes it easier to achieve a fuU contact 



(C*^' = 1). The value yu — 0.75 is used for interactions within the group proline, 
phenylalanine, tyrosine and tryptophan, to make face-to-face stacking of these 
side-chains less likely. It is also used within the group isoleucine, leucine and 
valine, because a full contact is otherwise hard to achieve for these pairs. In all 
other cases, yi j is unity. 
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Table 4: Atoms used in the calculation of the contact measure C 



Residue Set of atoms (/!/) 



Pro 

Tyr 

Val 

lie 

Leu 

Met 

Phe 

Trp 

Arg 

Lys 


Cp, Cj-, Cg 

Cy, Cgi, C52, Gel, Ce2» 

C/3, Cj-i, C^2 

Cp, Cyi, Cj2, C5 
Cfi, Cj, C51, C52 
C^, Cj-, S5, 

Cj, C51, C52. Cel, Ce2, 
Cj, C51, C52, Ce3, Cf3, C,j2 


Table 5: Atoms used in the calculation of the contact measure Cf^. 


Residue 


Set of atoms (A/) 


Arg 
Lys 
Asp 
Glu 


Ne, Cr, N^i, N^2 

O51, O52 
Oel, Oe2 



The definition of Cfj is similar. The ju parameter is unity for charge-charge 
interactions, and the sets of atoms used, ylf^', can be found in Table 5. 

Chain ends. Some of the sequences we study have extra groups attached at 
one or both ends of the chain. The groups occurring are N-terminal acetyl and 
succinylic acid, and C-terminal NH2. When such a unit is present, the model as- 
sumes polar NH and CO groups beyond the last C^ atom to hydrogen bond like 
backbone NH / CO groups but with the strength reduced by a factor 2 (multiplica- 
tively). The charged group of succinylic acid interacts like a charged side-chain. 

In the absence of end groups, the model assumes the N and C termini to be 
positively and negatively charged, respectively, and to interact like charged side- 
chains. 

Monte Carlo details. We investigate the folding thermodynamics of this model 
by Monte Carlo (MC) methods. The simulations are done using either simulated 
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Table 6: Algorithm used and total number of elementary MC steps for all systems stud- 
ied. 



System Method MC steps 



Trp-cage, E6apnl 


ST 


10 X 1.0 X 109 


C, EK, Fs, GCN4tp 


ST 


10 X 1.0 X 109 


HPLC-6 


ST 


lOxS.Ox IQS 


Chignolin 


ST 


lOxO.Sx 10^ 


MBH12 


ST 


lOx l.Ox 109 


GBlp 


ST 


10 X 2.0 X 109 


GBlm2, GBlmS 


ST 


10 X l.Ox 109 


trpzipl, trpzip2 


ST 


10 X l.Ox 109 


betanova, LLM 


ST 


10 X l.Ox 109 


betaSs 


ST 


10 X 2.0 X 109 


AB zipper 


PT 


64 X 3.0 X 109 


Top7-CFR 


FT 


64 X 2.4 X 109 


GS-asW 


FT 


64 X 3.5 X 109 



tempering (ST) (22; 23) or parallel tempering/replica exchange (PT) (24; 25), both 
with temperature as a dynamical variable. For small systems we use ST, with 
seven geometrically distributed temperatures in the range 279 K-367 K. For each 
system, ten independent ST runs are performed. For our largest systems we use 
PT with a set of sixteen temperatures, spanning the same interval. Using fourfold 
multiplexing (26), one run comprising 64 parallel trajectories is performed for 
each system. The PT temperature distribution is determined by an optimization 
procedure (26). The length of our different simulations can be found in Table 6. 

Three different conformational updates are used in the simulations: single 
variable updates of side-chain and backbone angles, respectively, and Biased 
Gaussian Steps (BGS) (27). The BGS move is semi-local and updates up to eight 
consecutive backbone degrees of freedom in a manner that keeps the ends of the 
segment approximately fixed. The ratio of side-chain to backbone updates is the 
same at all temperatures, whereas the relative frequency of the two backbone 
updates depends on the temperature. At high temperatures the single variable 
update is the only backbone update used, and at low temperatures only BGS is 
used. At intermediate temperatures both updates are used. 

The AB zipper, a two-chain system, is studied using a periodic box of size 
(158A)^. In addition to the conformational updates described above, the sim- 
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ulations of this system used rigid body translations and rotations of individual 
chains. 

Our simulations are performed using the open source C++-package PRO FAST (28) . 
Future public releases of PROFASI will include an implementation of the force 
field described here. While this force field has been implemented in PROFASI 
in an optimized manner, this optimization does not involve a parallel evaluation 
of the potential on many processors. Therefore, in our simulations the num- 
ber of processors used is the same as the number of MC trajectories generated. 
For a typical small peptide, a trajectory of the length as given in Table 6 takes 
~ 18 hours to generate on an AMD Opteron processor with ~ 2.0 GHz clock rate. 
For the largest system studied, GS-aaW, the simulations, with a proportionately 
larger number of MC updates, take ~ 10 days to complete. 

Analysis. In our simulations, we monitor a variety of different properties. Three 
important observables are as follows. 

1. a-helix content, h. A residue is defined as helical if its Ramachandran an- 
gle pair is in the region -90° < (j)< -30°, -77° <ip < -17°. Following (29), 
a stretch of n > 2 helical residues is said to form a helical segment of length 
n-2. For an end residue that is not followed by an extra end group, the 
{(j),xp) pair is poorly defined. Thus, for a chain with N residues, the max- 
imum length of a helical segment is iV — 4, iV — 3 or — 2, depending on 
whether there are zero, one or two end groups. The a-helix content h is 
defined as the total length of all helical segments divided by this maximum 
length. 

2. Root-mean-square deviation from a folded reference structure, bRMSD/ 
RMSD/pRMSD. bRMSD is calculated over backbone atoms, whereas RMSD 
is calculated over all heavy atoms. All residues except the two end residues 
are included in the calculation, unless otherwise stated. For the case of the 
dimeric AB zipper, the periodic box used for the simulations has to be taken 
into account. The two chains in the simulation might superficially appear 
to be far away when they are in fact close, because of periodicity. For this 
case we evaluate backbone RMSD over atoms taken from both chains in 
the dimer, and minimize this value with respect to periodic translations. 
We denote this as pRMSD. 
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3. Nativeness measure based on hydrogen bonds, q^h- This observable has 
the value 1 if at most two native backbone-backbone hydrogen bonds are 
missing, and is otherwise. A hydrogen bond is considered formed if its 
energy is less than - 1.03 eu. 

In many cases, it turns out that the temperature dependence of our results can 
be approximately described in terms of the simple two-state model 



where X{ T) is the quantity studied, Xi and X2 are the values of X in the two states, 
and K{T] is the effective equilibrium constant [R is the gas constant). In this 
first-order form, K{T] contains two parameters: the melting temperature Tm and 
the energy difference AE. The parameters r^, AE, Xi and X2 are determined by 
fitting to data. 

Thermal averages and their statistical errors are calculated by using the jack- 
knife method (30), after discarding the first 20 % of each MC trajectory for ther- 
malization. 

Figures of 3D structures were prepared using PyMOL (31). 



We study a total of 20 peptide/protein systems, listed in Table 1 (amino acid se- 
quences can be found in this table). Among these, there are 17 smaller systems 
with 10-37 residues and 3 larger ones with > 49 residues. Many of the smaller 
systems have been simulated by other groups, in some cases with explicit water 
(for a review, see (32)). Two of the three larger systems, as far as we know, have 
not been studied using other force fields. A study of the 67-residue three-helix- 
bundle protein GS-aaW using the ECEPP/3 force field was recendy reported (33) . 
The simulations presented here use the same geometric representation and find 
about a hundred times the number of independent folding events, while con- 
suming much smaller computing resources. 

Trp-cage and E6apnl. The Trp-cage is a designed 20-residue miniprotein with 
a compact helical structure (34). Its NMR-derived native structure (see Figure 0) 
contciins an a-helix and a single turn of 3io-helix (34). The E6apnl peptide was 




(15) 



RESULTS 
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designed using the Trp-cage motif as a scaffold, to inhibit the E6 protein of pa- 
pillomavirus (35). E6apnl is three residues larger than the Trp-cage but has a 
similar structure, except that the a-helix is slightly longer (35). 

As indicated earlier, we use melting data for the Trp-cage to set the energy 
scale of the model. For this peptide, several experiments found a similar melt- 
ing temperature, Tm ~ 315 K (34; 36; 37). In our model, the heat capacity of the 
Trp-cage displays a maximum at RT = 0.4722 ± 0.0008 eu. Our energy unit eu 
is converted to kcal/mol by setting this temperature equal to the experimental 
melting temperature (315 K). Having done that, there is no free parameter left in 
the model. Other systems are thus studied without tuning any model parameter. 
For E6apnl, the experimental melting temperature is Tra ~ 305 K (35). 

Figure la shows the helix content h against temperature for the Trp-cage and 
E6apnl, as obtained from our simulations. In both cases, the T dependence is 
well described by the simple two-state model of Equation 15. The fitted melting 
temperatures are = 309.6 ± 0.7 K and = 304.0 ± 0.5 K for the Trp-cage and 
E6apnl, respectively. This value for the Trp-cage is slightly lower than that 
we obtain from heat capacity data, 315 K. A fit to our data for the hydrophobic- 
ity energy Ehp (not shown) gives instead a slightly larger T^, 321.1 ± 0.8 K. This 
probe dependence of implies an uncertainty in the determination of the en- 
ergy scale. By using the Trp-cage, this uncertainty is kept small (~ 2 %). For many 
other peptides, the spread in Tm is much larger (see below). 

Figure lb shows the free energy calculated as a function of bRMSD for the Trp- 
cage and E6apnl at two different temperatures. The first temperature, 279 K, is 
well below Tm. Here native-like conformations dominate and the global free- 
energy minima are at 2.4A and 2.0 A for the Trp-cage and E6apnl, respectively. 
At the second temperature, 306 K, the minima are shifted to higher bRMSD. Note 
that these free-energy profiles, taken near Tm, show no sign of a double-well 
structure. Hence, these peptides do not show a genuine two-state behavior in 
our simulations, even though the melting curves (Figure la) are well described 
by a two-state model, as are many experimentally observed melting curves. 

The a-helices C, EK, Fj, GCN4tp and HPLC-6. Our next five sequences form 
a-helices. Among these, there are large differences in helix stability, according to 
CD studies. The least stable are the C (38) and EK (39) peptides, which are only 
partially stable at T ~ 273 K. The original C peptide is a 13-residue fragment of 
ribonuclease A, but the C peptide here is an analogue with two alanine substi- 
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I I I I I I '. I 2 ' ' ' ' ' ' ' 



280 300 320 340 360 380 2 4 6 8 10 12 

r(K) bRMSDCA) 

Figure 2: The Trp-cage and E6apnl. (a) Helix content h against temperature. The lines 
are two-state fits (Tm = 309.6 ± 0.7 K and A£' = 1 1.3 ± 0.3 kcal/mol for the Trp- 
cage; rm = 304.0±0.5KandA£'= 14.2 ±0.3 kcal/mol for E6apnl). (b) Free en- 
ergy F calculated as a function of bRMSD at two different temperatures, 279 K 
(solid lines) and 306 K (dashed lines). The double lines indicate the statistical 
errors. 

tutions and a slightly increased helix stability (40). The EK peptide is a designed 
alanine-based peptide with 14 residues. 

Our third a-helix peptide is the 21 -residue Fg (41), which is also alanine-based. 
Fs is more stable than C and EK (41; 42), with estimated values of 308 K (42) 
and 303 K (43) from CD studies and 334 K from an IR study (44). Even more stable 
is HPLC-6, a winter flounder antifreeze peptide with 37 residues. CD data suggest 
that the helix content of HPLC-6 remains non-negligible, ~ 0. 10, at temperatures 
as high as ~ 343 K (45). Our fifth helix- forming sequence, which we call GCN4tp, 
has 17 residues and is taken from a study of GCN4 coiled-coil formation (46). Its 
melting behavior has not been studied, as far as we know, but its structure was 
characterized by NMR (46). 

These five peptides are indeed a-helical in our model. At 279 K, the calculated 
helix content h is 0.28 for the C peptide, 0.47 for the EK peptide, and > 0.60 for 
the other three peptides. Figure 2 shows the temperature dependence of h. By 
fitting Equation 15 to the data for the three stable sequences, we find melting 
temperatures of 298.9 ± 0. 1 K, 309.2 ± 0.3 K and 323.3 ± 1.2 K for GCN4tp, Fs and 
HPLC-6, respectively. 

For the four peptides whose melting behavior has been studied experimen- 
tally, these results are in good agreement with experimental data. In particular, 
we find that HPLC-6 indeed is more stable than Eg in the model, which in turn 
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Figure 3: The C, EK, Fg, GCN4tp and HPLC-6 peptides. Helix content h against tempera- 
ture. The lines are two-state fits ( = 276.3±2.4 K and AE = 1 1.7±0.4 kcal/mol 
forC; Tm = 293.9 ± 0.4 K and A£= 12.6 ±0.2 kcal/mol for EK; 7^ = 309.2 ± 0.3 K 
and AE = 18.7 ± 0.4 kcal/mol for Fs; = 298.9 ± 0.1 K and AE = 14.1 ± 
0.1 kcal/mol for GCN4tp; = 323.3 ± 1.2 K and AE = 23.6 ±2.2 kcal/mol for 
HPLC-6). 

is more stable than both C and EK. The model thus captures the stability order 
among these peptides. 

The P -hairpins chignolin and MBH12. We now turn to ^-sheet peptides and 
begin with the -hairpins chignolin (47) and MBH12 (48) with 10 and 14 residues, 
respectively. Both are designed and have been characterized by NMR. For chig- 
nolin, Tm values in the range 311-315 K were reported (47), based on CD and 
NMR. We are not aware of any melting data for MBH12. 

Figure 3 shows the temperature dependence of the hydrophobicity energy Ehp 
and the nativeness parameter q^b for these peptides. By fitting to Ehp data, we 
obtain = 311.0 ± 0.5 K and = 315.4 ± 1.3 K for chignolin and MBH12, re- 
spectively. Using qhh data instead, we find Tm = 305.4 ± 0.5 K for chignolin and 
Tm = 309.2 ± 0.7 K for MBH12. These values show a significant but relatively 
weak probe dependence. The values for chignolin can be compared with experi- 
mental data, and the agreement is good. 

Because these peptides have only four native hydrogen bonds each, one may 
question our definition of qhb (see Methods) , which takes a conformation as native- 
like (<7hb = 1) even if two hydrogen bonds are missing. Therefore, we repeated the 
analysis using the stricter criterion that native-like conformations iqhh = 1) may 
lack at most one hydrogen bond. The resulting decrease in native population, as 
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Figure 4: Chignolin and MBH12. (a) Hydrophobicity energy Ehp against temperature. 

The lines are two-state fits (Tm = 311.0 ± 0.5 K and AE = 9.6 ± 0.2 kcal/mol for 
chignolin; = 315.4 ± 1.3 K and AE = 9.9 ± 0.9 kcal/mol for MBH12). (b) Na- 
tiveness qhh against temperature. The lines are two-state fits (Tm = 305.4±0.5 K 
and AE = 10.4 ± 0.1 kcal/mol for chignolin; Tm = 309.2 ± 0.7 K and AE = 
13.5 ± 0.2 kcal/mol for MBH12). 

measured by the average q^^,, was ~ 0.1 or smaller at all temperatures. Even with 
this stricter definition, we find native populations well above 0.5 at low tempera- 
tures for both peptides. 

The P -hairpins GBlp, GBlm2 and GBlmS. GBlp is the second {3 -hairpin of 
the Bl domain of protein G (residues 41-56). Its folded population has been es- 
timated by CD/NMR to be 0.42 at 278 K (49) and ~ 0.30 at 298 K (50), whereas a 
Trp fluorescence study found a Tm of 297 K (51), corresponding to a somewhat 
higher folded population. GBlm2 and GBlmS are two mutants of GBlp with 
significantly enhanced stability (50). At 298 K, the folded population was found 
to be 0.74 ± 0.05 for GBlm2 and 0.86 ± 0.03 for GBlmS, based on CD and NMR 
measurements (50). It was further estimated that Tm = 320 ± 2 K for GBlm2 and 
= 333 ± 2 K for GB lm3 (50) . 

All these three peptides are believed to adopt a structure similar to that GBlp 
has as part of the protein G Bl domain (PDB code IGBl). This part of the full 
protein contains seven backbone-backbone hydrogen bonds. These hydrogen 
bonds are the ones we consider when evaluating qhh for these peptides. 

Figure 4 shows the observables Ehp and qhh against temperature for these pep- 
tides. Fits to the data give Ehp-based values of 301.7 ±3.3 K, 324.4 ± 1.1 K and 
331.4±0.7K for GBlp, GBlm2 andGBlm3, respectively, and <7hb-based Tm values 
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Figures: GBlp, GBlm2 and GBlmS. (a) Hydrophobicity energy Ehp against tempera- 
ture. The lines are two-state fits (Tm = 301.7±3.3 Kand A£' = 11.3±1.1 kcal/mol 
for GBlp; = 324.4 ± 1.4 K and AE = 13.2 ± 1.0 kcal/moI for GBlm2; Tm = 
331.4 ± 0.7 K and AE = 14.8 ± 0.5 kcal/mol for GBlm3). (b) Nativeness cfhh 
against temperature. The lines are two-state fits (Tm = 307.5 ± 0.5 K and AE = 
20.7 ±0.5 kcal/mol for GBlm2; Tm = 313.9± 1.4K and A£ = 21.4± 1.1 kcal/mol 
forGBlmS). 

of 307.5±0.5 K and 313.9±1.4 Kfor GBlm2 and GBlmS, respectively. The qtb data 
do not permit a reliable fit for the less stable GBlp. At 298 K, we find <7hb-based 
folded populations of 0.20, 0.64 and 0.74 for GBlp, GBlm2 and GBlmS, respec- 
tively, which can be compared with the above-mentioned experimental results 
(0.30, 0.74 and 0.86). 

These results show that, in the model, the apparent folded populations of these 
peptides depend quite strongly on the observable studied. Our Ehp -based re- 
sults agree quite well with experimental data, especially for GBlm2 and GBlmS, 
whereas our qhh results consistendy give lower folded populations for all pep- 
tides. The stability order is the same independent of which of the two observ- 
ables we study, namely GBlp <GBlm2< GBlmS, which is the experimentally 
observed order. 

The stability difference between GBlm2 and GBlmS is mainly due to charge- 
charge interactions. In our previous model (6), these interactions were ignored, 
and both peptides had similar stabilities. The present model splits this degener- 
acy. Moreover, the magnitude of the splitting, which sensitively depends on the 
strength of the charge-charge interactions, is consistent with experimental data. 
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Figure 6: Trpzipl and trpzip2. (a) Hydrophobicity energy E^p against temperature. The 
lines are two-state fits (Tm = 319.7 ± 0.2 K and AE = 7.9 ±0.1 kcal/mol for tr- 
pzipl; Tm = 327.1 ± 0.8 K and AE = 8.3 ± 0.4 kcal/mol for trpzip2). (b) Native- 
ness qhh against temperature. The lines are two-state fits (Tm = 303.2 ± 1.8 K 
and AE = 14.1 ±0.5 kcal/mol for trpzipl; 7^ = 305.0 ± 1.1 K and AE = 12.6 ± 
0.3 kcal/mol for trpzip2). 

The ^-hairpins trpzipl and trpzipZ. The 12-residue trpzipl and trpzip2 are 
designed -hairpins, each containing two tryptophans per ^-strand (52). The 
only difference between the two sequences is a transposition of an asparagine 
and a glycine in the hairpin turn. CD measurements suggest that trpzipl and 
trpzip2 are remarkably stable for their size, with Tm values of 323 K and 345 K, 
respectively (52). A complementary trpzip2 study, using both experimental and 
computational methods, found Tm values to be strongly probe -dependent (53). 

Figure 5 shows our melting curves for these peptides, based on the observables 
Ehp and q^h- The ^hp -based Tm values are 319.7±0.2 K and 327.1±0.8 Kfor trpzipl 
and trpzip2, respectively. Using qhb data instead, we find Tm = 303.2 ± 1.1 K for 
trpzipl and = 305.0 ± 1.1 K for trpzip2. 

Like for the other ^-hairpins discussed earlier, our <7hb -based folded popula- 
tions are low compared to estimates based on CD data, whereas those based on 
Ehp are much closer to experimental data. For trpzip2, the agreement is not per- 
fect but acceptable, given that has been found to be strongly probe -dependent 
for this peptide (53). 

Three-stranded /3 -sheets: betanova, LLM and betaSs. Betanova (54), the be- 
tanova triple mutant LLM (55) and beta3s (56) are designed 20-residue peptides 
forming three-stranded )8 -sheets. All the three peptides are marginally stable. 
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Figure 7: Betanova, LLM and betaSs. (a) Hydrophobicity energy E^p against tempera- 
ture. The lines are two-state fits (Tm = 318.8±2.5 Kand = 13.3±2.1 kcal/mol 
for betanova; = 305.6 ± 1.7 K and AE = 13.4 ± 1.0 kcal/mol for LLM; = 
295.7±3.1 K and AE = 9.7±0.5 kcal/mol for beta3s). (b) Nativeness qhb against 
temperature. Two-state fits were not possible. 



NMR studies suggest that the folded population at 283 K is 0.09 for betanova (55), 
0.36 for LLM (55) , and 0. 13-0.3 1 for beta3s (56) . 

Figure 6 shows our Ehp and q^h data for these peptides. From the qhh data, 
values cannot be extracted, because the stability of the peptides is too low. At 
283 K, the <7hb-based folded populations are 0.08, 0.47, 0.28 for betanova, LLM 
and betaSs, respectively, in good agreement with the experimental results. Fits to 
Ehp data can be performed. The obtained Tm values are 318.8±2.5 K, 305. 6± 1.7 K 
and 295.7 ± 3. 1 K for betanova, LLM and beta3s, respectively. 

These Ehp -based Tm values are high compared to the experimentally deter- 
mined folded populations, especially for betanova. Note that betanova has a very 
low hydrophobicity. The correlation between Ehp and folding status is therefore 
likely to be weak for this peptide. 

In contrast to the Ehp -based folded populations, those based on qhb agree quite 
well with experimental data. In this respect, the situation is the opposite to what 
we found for the )8 -hairpins studied above. A possible reason for this difference 
is discussed below. 



AB zipper. The AB zipper is a designed heterodimeric leucine zipper, com- 
posed of an acidic A chain and a basic B chain, each with 30 residues (57). The 
dimer structure has been characterized by NMR, and a melting temperature of 
~ 340 K was estimated by CD measurements (at neutral pH) (57). 
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The lowest energy state seen in our simulations is a conformation in which 
pRMSD calculated over backbone atoms of all residues in both chains is ~ 2.7 A. 
In this structure, the bRMSD (all residues) of the individual chains A and B to 
their counterparts in the PDB structure are ~ 2.5 A and ~ 2.4 A, respectively. Un- 
like for the other systems described in this article, the boundary conditions have 
a non-trivial role for this dimeric system. A proper discussion of periodicity, con- 
centration and temperature dependence of this system is beyond the scope of 
this article. In Figure 7a, we show the energy landscape, i.e., the mean energy as 
a function of two order parameters for this system. The X-axis shows the measure 
pRMSD described earlier. The Y-axis represents the sum of the backbone RMSD 
of the individual chains. pRMSD can be very large even if the sum of bRMSDs 
is small: the two chains can be folded without making the proper inter-chain 
contacts. Indeed, the figure shows that the major energy gradients are along the 
Y-axis, showing that it is energetically favorable for both chains to fold to their 
respective helical states. The correct dimeric native state is energetically more 
favorable by ~ 20 kcal/mol compared to two folded helices without proper inter- 
chain contacts. This is seen more clearly in Figure 7b, where we plot the average 
energy as a function of pRMSD for states with two folded chains. We also sim- 
ulated the two chains A and B of the dimer in isolation. Both chains folded to 
their native helical conformations. The melting temperatures estimated based 
on helix content for chains A and B are 314 K and 313 K, respectively. As indicated 
above, for the dimer, thermodynamic parameters like Tm cannot be directly esti- 
mated from the present simulations. 

Top7-CFr. Top7-CFr, the C-terminal fragment of the designed 93-residue a/P - 
protein Top7 (58), is the most complex of all molecules studied here. It has both 
ff-helix and P -strand secondary structure elements, and highly non-local hydro- 
gen bonds between the N- and C-terminal strands. CFr is known to form ex- 
tremely stable homodimers, which retain their secondary structure till very high 
temperatures like 371 K and high concentrations of denaturants (59). 

In (13; 14), an earlier version of our model was used to study the folding of 
CFr. The simulations pointed to an unexpected folding mechanism. The N- 
terminal strand initially folds as a non-native continuation of the adjoining ff- 
helix. After the other secondary structure elements form and diffuse to an ap- 
proximately correct tertiary organization, the non-native extension of the helix 
unfolds and frees the N-terminal residues. These residues then attach to an ex- 
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(a) pRMSD (A) (b) pRMSD(A) 

Figure 8: The heterodimeric AB zipper, (a) Mean energy as a function of pRMSD over 
both chains and the sum of individual bRMSDs. The direction of the energy 
gradients implies that a system with two folded monomers is energetically fa- 
vorable compared to unfolded monomers. The proper dimeric form is the area 
closest to the origin, and has a lower energy, (b) Mean energy of all states in 
which both chains have bRMSD < 5 A, shown as a function of the dimer RMSD 
measure pRMSD. 

isting ^-hairpin to complete the three -stranded ^ -sheet of the native structure. 
Premature fastening of the chain ends in yS -sheet contacts puts the molecule in 
a deep local energy minimum, in which the folding and proper arrangement of 
the other secondary structure elements is hampered by large steric barriers. The 
above "caching" mechanism, spontaneously emerging in the simulations, accel- 
erates folding by helping the molecule avoid such local minima. 

The folding properties of CFr, including the above mentioned caching mecha- 
nism, are preserved under the current modifications of the interaction potential. 
The center of the native free-energy minimum shifts from bRMSD (all residues) 
of I.7A as reported in (13) to about 2.2 A. This state remains the minimum en- 
ergy state, although the new energy function changes the energy ordering of the 
other low energy states. The runs made for this study (see Table 6) found 22 inde- 
pendent folding events. The free-energy landscape observed in the simulations 
is rather complex with a plethora of deep local minima sharing one or more sec- 
ondary structure elements with the native structure. They differ in the registry 
and ordering of strands and the length of the helix. Longer runs are required for 
the MC simulations to correctly weight these different minima. Temperature de- 
pendence of the properties of CFr can therefore not be reliably obtained from 
these runs. 
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Figure 9: The three-helix-bundle protein GS-aaW. (a) Variation of histogram of bRMSD 
with temperature. At high temperatures, there is a broad distribution of 
bRMSD with values > lOA. At lower temperatures there are three clearly sep- 
arated clusters. Representative structures from these clusters are also shown 
(color) aligned with the native structure (gray), (b) Temperature dependence 
of specific heat, Cy , and the ratio hr of the observed helix content and the helix 
content of the native structure. 



We note that the simulations ran on twice as many processors but were only 
about one sixth the length of those used for (13), in which 15 independent folding 
events were found. The improved efficiency is partly due to the changes in the 
energy function presented here, and partly due to the optimization of the parallel 
tempering described in (26). 

GS-ff sW. GS-asWis a designed three-helix-bundle protein with 67 residues (60) , 
whose structure was characterized by NMR (61). The stability was estimated to 
be 4.6 kcal/mol in aqueous solution at 298 K, based on CD data (60). 

It turns out that this protein is very easy to fold with our model. Our results are 
based on extensive sampling of the conformation space with 64 x 3.5 x 10^ Monte 
Carlo updates, resulting in about 800 independent folding events to the native 
state. For this estimate, structures with bRMSD (all residues) under 5 A were 
taken to be in the native minimum (see Figure 8 for justification). Two visits to 
the native state were considered statistically independent (i) if they occurred in 
independent Markov chains, or (ii) if the two visits to the native state were sep- 
arated by at least one visit to the highest temperature in the simulation. For the 
entire run, we spent about 10 days of computing time on 64 AMD Opteron pro- 
cessors running at 2.0 GHz. 
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In Figure 8a, we show how the probabilities for structures with different bRMSD 
vary with temperature in the simulations. Clearly, the protein makes a transition 
from a rather continuous distribution of bRMSD at high temperatures to a distri- 
bution dominated by three well separated clusters. Analysis of the structures at 
the lower temperatures shows that all three free-energy minima consist almost 
exclusively of structures with all three helices of GS-aaW formed. The plot of 
the ratio of the observed helix content and the helix content of the native state, 
shown in Figure 8b, further supports this idea. The average value of this ratio 
approaches 1 as the temperature decreases below 300 K. The specific heat curve, 
also shown in Figure 8b, indicates that the formation of these structures corre- 
lates with the steepest change in energy. 

The cluster with a center at bRMSD ~3A dominates at the lowest tempera- 
tures. The structures contributing to the cluster with ~8-9A bRMSD superfi- 
cially look like well folded three-helix bundles. But as illustrated in the figure, the 
arrangement of the helices is topologically distinct from the native arrangement. 
The cluster seen at larger bRMSD values is broader and consists of a host of struc- 
tures in which two of the helices make a helical hairpin, but the third helix is not 
bound to it. The unbound helix could be at either side of the chain. 

According to our model therefore, the population at the lowest temperatures 
consists of ~ 80% genuinely native structures, ~ 10% three-helix bundles with 
wrong topology, and ~ 10% other structures with as much helix content as the 
native state. In order to experimentally determine the true folded population of 
the protein, the experimental probe must be able to distinguish the native fold 
from the other helix rich structures described here. 

DISCUSSION 

The model presented here is intrinsically fast compared to many other all-atom 
models, because all interactions are short range. By exploiting this property and 
using efficient MC techniques, it is possible to achieve a high sampling efficiency. 
We could, for example, generate more than 800 independent folding events for 
the 67-residue GS-asW. The speed of the simulations thus permits statistically 
accurate studies of the global free-energy landscape of peptides and small pro- 
teins. 

In developing this potential, a set of 17 peptides with 10-37 residues was stud- 
ied. The peptides were added to this set one at a time. To fold a new sequence 
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sometimes required fine-tuning of the potential, sometimes not. A change was 
accepted only after testing the new potential on all previous sequences in the set. 
In its final form, the model folds all 17 sequences to structures similar to their ex- 
perimental structures, for one and the same choice of potential parameters. 

Also important is the stability of the peptides. A small polypeptide chain is 
unlikely to be a clear two-state folder, and therefore its apparent folded popula- 
tion will generally depend on the observable studied. For p -sheet peptides, we 
used the hydrophobicity energy Ehp and the hydrogen bond-based nativeness 
measure qhb to monitor the melting behavior. The extracted values indeed 
showed a clear probe dependence; the ^hp -based value was always larger than 
that based on qhb- For the p -hairpins studied, we found a good overall agreement 
between our Ehp -based results and experimental data. For the three -stranded p- 
sheets, instead, the qhb results agreed best with experimental data. The reason 
for this difference is unclear. One contributing factor could be that interactions 
between aromatic residues play a more important role for the ^-hairpins stud- 
ied here than for the three-stranded )S -sheets. These interactions may influence 
spectroscopic signals and are part of Ehp. Probe-dependent Tm values have also 
been obtained experimentally, for example, for trpzip2 (53). 

The probe dependence makes the comparison with experimental data less 
straightforward. Nevertheless, the results presented clearly show that the model 
captures many experimentally observed stability differences. In particular, among 
related peptides, the calculated order of increasing thermal stability generally 
agrees with the experimental order, independent of which of our observables we 
use. 

It is encouraging that the model is able to fold these 17 sequences. However, 
there is no existing model that will fold all peptides, and our model is no ex- 
ception. Two sequences that we unsuccessfully tried to fold are the ^-hairpins 
trpzip4 and Uie, both with 16 residues. Trpzip4 is a triple mutant of GBlp with 
four tryptophans (52). For trpzip4, our minimum energy state actually corre- 
sponded to the NMR-derived native state (52), but the population of this state 
remained low at the lowest temperature studied (~ 14 % at 279 K, as opposed 
to an estimated 7^ of 343 K in experiments (52)). Uie is derived from the N- 
terminal -hairpin of ubiquitin (62). It has a shortened turn and has been found 
to form a ^-hairpin with non-native registry (62). In our simulations, this state 
was only weakly populated (~ 8% at 279 K, as opposed to an estimated ~ 80% 
at 288 K (62)). Instead, the main free-energy minima corresponded to the two 
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-hairpin states with the registry of native ubiquitin, one with native hydrogen 
bonds and the other with the complementary set of hydrogen bonds. 

Our calibration of the potential relies on experimental data with non-negligible 
uncertainties, on a limited number of peptides. It is not evident that this poten- 
tial will be useful for larger polypeptide chains. Therefore, as a proof-of-principle 
test, we also studied three larger systems, with very good results. Our simulations 
showed that, without having to adjust any parameter, the model folds these se- 
quences to structures consistent with experimental data. Having verified this, 
it would be interesting to use the model to investigate the mechanisms by which 
these systems self- assemble, but such an analysis is beyond the scope of this arti- 
cle. The main purpose of our present study of these systems was to demonstrate 
the viability of our calibration approach. 

The potential can be further constrained by confronting it with more accurate 
experimental data and data on new sequences. The challenge in this process 
is to ensure backward compatibility — new constraints should be met without 
sacrificing properties already achieved. 

CONCLUSION 

We have described and tested an implicit solvent all-atom model for protein sim- 
ulations. The model is computationally fast and yet able to capture structural 
and thermodynamic properties of a diverse set of sequences. Its computational 
efficiency greatly facilitates the study of folding and aggregation problems that 
require exploration of the full free-energy landscape. A program package, called 
PRO FAST (28), for single- and multi-chain simulations with this model is freely 
available to academic users. 
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